21 research outputs found
Exploiting Context-Dependent Quality Metadata for Linked Data Source Selection
The traditional Web is evolving into the Web of Data which consists of huge collections
of structured data over poorly controlled distributed data sources. Live
queries are needed to get current information out of this global data space. In live
query processing, source selection deserves attention since it allows us to identify the
sources that are likely to contain the relevant data. The thesis proposes a source
selection technique in the context of live query processing on Linked Open Data,
which takes into account the context of the request and the quality of data contained in
the sources to enhance the relevance (since the context enables a better interpretation
of the request) and the quality of the answers (which will be obtained by processing
the request on the selected sources). Specifically, the thesis proposes an extension of
the QTree indexing structure that had been proposed as a data summary to support
source selection based on source content, to take into account quality and contextual
information. With reference to a specific case study, the thesis also contributes an approach,
relying on the Luzzu framework, to assess the quality of a source with respect
to a given context (according to different quality dimensions). An experimental
evaluation of the proposed techniques is also provided
Interlinking SciGraph and DBpedia Datasets Using Link Discovery and Named Entity Recognition Techniques
In recent years we have seen a proliferation of Linked Open Data (LOD) compliant datasets becoming available on the web, leading to an increased number of opportunities for data consumers to build smarter applications which integrate data coming from disparate sources. However, often the integration is not easily achievable since it requires discovering and expressing associations across heterogeneous data sets. The goal of this work is to increase the discoverability and reusability of the scholarly data by integrating them with highly interlinked datasets in the LOD cloud. In order to do so we applied techniques that a) improve the identity resolution across these two sources using Link Discovery for the structured data (i.e. by annotating Springer Nature (SN) SciGraph entities with links to DBpedia entities), and b) enrich SN SciGraph unstructured text content (document abstracts) with links to DBpedia entities using Named Entity Recognition (NER). We published the results of this work using standard vocabularies and provided an interactive exploration tool which presents the discovered links w.r.t. the breadth and depth of the DBpedia classes
Context Aware Source Selection for Linked Data
The traditional Web is evolving into the Web of Data, which gathers huge collections of structured data over distributed, heterogeneous data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection allows the identification of the sources that most likely contain relevant content. Due to the semantic heterogeneity of the Web of Data, however, it is not always easy to assess relevancy. Context information might help in interpreting the user's information needs. In this paper, we discuss how context information can be exploited to improve source selection
LinkedDataOps: linked data operations based on quality process cycle
This paper describes three new Geospatial Linked Data
(GLD) quality metrics that help evaluate conformance to standards.
Standards conformance is a key quality criterion, for example for FAIR
data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets
that showed a wide variation in standards conformance. This is the first
set of Linked Data quality metrics developed specifically for GLD
Distribution and occurrence of microsporidian pathogens of the willow flea beetle, Crepidodera aurata (Coleoptera: Chrysomelidae) in North Turkey
In this study, microsporidian pathogens in Crepidodera aurata populations were investigated. A total of 1,728 C. aurata adults were examined for microsporidian pathogens and 78 of them were found to be infected. Two species of microsporidia, Microsporidium sp. 1 and Microsporidium sp. 2, were observed in the C. aurata populations from ten localities in North Turkey. They differ considerably from each other in spore morphology and dimensions, infection rate and host locality. The spores of Microsporidium sp. 1 were oval in shape and measured from 3.66 to 5.66 µm in length and from 1.35 to 2.22 µm in width (n=50). The spores of Microsporidium sp. 2 were slightly curled and measured from 2.44 to 3.55 µm in length and from 1.25 to 1.55 µm in width (n=50). These microsporidia were recorded from C. aurata for the first time. Here we present the occurrence and distribution of two microsporidia in C. aurata populations as potential natural suppressing factors
Quality metrics to measure the standards conformance of geospatial linked data
This paper describes three new Geospatial Linked Data
(GLD) quality metrics that help evaluate conformance to standards.
Standards conformance is a key quality criterion, for example for FAIR
data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets
that showed a wide variation in standards conformance. This is the first
set of Linked Data quality metrics developed specifically for GLD
A SKOS taxonomy of the UN global geospatial information management data theme
Complex data domains increase the difficulty of structuring, sharing, discovering and governing information. For the geospatial domain common models such as INSPIRE have been established in the European Union. The United Nations initiative on Global Geospatial Information Management (UN-GGIM) draws together national and regional capacities. Interoperability is the main principle behind these initiatives. Nonetheless there is a lack of published research to date on mapping agency geospatial linked data leveraging the UN-GGIM taxonomy of information management data themes. Thus, we have identified use cases and defined a Simple Knowledge Organization System (SKOS, https://www.w3.org/TR/skos-reference/) taxonomy expressing the UN-GGIM data themes for national spatial infrastructure. This has been applied in a metadata generation and reporting tool for Ordnance Survey Ireland (OSi) which underpinned improved governance and reporting infrastructure in OSi. This demonstrated the contribution of Semantic Web technology to spatial data governance as well as its importance for data publishing. This paper presents a documented open license SKOS taxonomy for the UN-GGIM data themes that follows Linked Data best practices. It provides a set of three use cases, an overview of UN-GGIM theme definitions and an example application of the taxonomy for deployment in OSi for DCAT metadata generation and data publishing pipeline reporting
Data quality and patient characteristics in European ANCA-associated vasculitis registries: data retrieval by federated querying
Objectives: This study aims to describe the data structure and harmonisation process, explore data quality and define characteristics, treatment, and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries. Methods: Through creation of the vasculitis-specific Findable, Accessible, Interoperable, Reusable, VASCulitis ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL Protocol and Resource Description Framework Query Language (SPARQL) and outcome rates were assessed through random effects meta-analysis. Results: A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness and correctness ranged from 49% to 100% and from 60% to 100%, respectively. There were 2754 (52.1%) cases classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. The pattern of organ involvement included: lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%). Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%), rituximab in 505 (17.7%) and pulsed intravenous glucocorticoid use was highly variable (11%–91%). Overall mortality and incidence rates of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively. Conclusions: In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings
Results of the ontology alignment evaluation initiative 2020
The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2020 campaign offered 12 tracks with 36 test cases, and was attended by 19 participants. This paper is an overall presentation of that campaign